Overview

Dataset Statistics

Number of Variables 19
Number of Rows 150000
Missing Cells 135771
Missing Cells (%) 4.8%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 59.0 MB
Average Row Size in Memory 412.1 B
Variable Types
  • Categorical: 7
  • Numerical: 11
  • Geography: 1
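
The missing-cell figures above can be cross-checked against the per-variable Missing counts reported in the Variables section. The sketch below assumes the percentage is defined as missing cells divided by total cells (rows times variables); the report does not state its formula, so treat that as an assumption.

```python
# Values copied from the Dataset Statistics block of this report.
n_rows = 150_000
n_variables = 19
missing_cells = 135_771

# Per-variable Missing counts copied from the Variables section
# (variables not listed here report 0 missing values).
missing_by_variable = {
    "b": 12_984, "c": 12_984, "d": 365, "f": 11,
    "g": 194, "l": 11, "m": 365, "o": 108_857,
}

# Assumption: percentage = missing cells / (rows * variables).
total_cells = n_rows * n_variables
missing_pct = 100 * missing_cells / total_cells
per_variable_total = sum(missing_by_variable.values())

print(f"total cells: {total_cells}")
print(f"missing cells (%): {missing_pct:.1f}")
print(f"sum of per-variable missing counts: {per_variable_total}")
```

Under that assumption the per-variable counts add up to the dataset-level total, and variable o alone accounts for most of the missing cells.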

Dataset Insights

Missing
  • b has 12984 (8.66%) missing values
  • c has 12984 (8.66%) missing values
  • o has 108857 (72.57%) missing values

Skewed
  • c is skewed
  • d is skewed
  • e is skewed
  • f is skewed
  • h is skewed
  • m is skewed
  • monto is skewed
  • score is skewed

High Cardinality
  • g has a high cardinality: 51 distinct values
  • j has a high cardinality: 8324 distinct values
  • fecha has a high cardinality: 145813 distinct values

Constant Length
  • a has constant length 1
  • g has constant length 2
  • j has constant length 11
  • n has constant length 1
  • o has constant length 1
  • p has constant length 1
  • fecha has constant length 19
  • fraude has constant length 1

Zeros
  • e has 65055 (43.37%) zeros
  • f has 25390 (16.93%) zeros
  • h has 12840 (8.56%) zeros
  • m has 15898 (10.6%) zeros

Variables


a

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9900000
  • The largest value (4) is over 8.94 times larger than the second largest value (2)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 4
2nd row 4
3rd row 4
4th row 4
5th row 4

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 150000
  • The top 2 categories (4, 2) take over 50.0%
  • a has words of constant length

b

numerical

Approximate Distinct Count 7672
Approximate Unique (%) 5.6%
Missing 12984
Missing (%) 8.7%
Infinite 0
Infinite (%) 0.0%
Memory Size 2192256
Mean 0.7281
Minimum 0
Maximum 1
Zeros 472
Zeros (%) 0.3%
Negatives 0
Negatives (%) 0.0%
  • b is skewed left (γ1 = -1.6253)

Quantile Statistics

Minimum 0
5-th Percentile 0.4893
Q1 0.6784
Median 0.7555
Q3 0.8065
95-th Percentile 0.887
Maximum 1
Range 1
IQR 0.1281

Descriptive Statistics

Mean 0.7281
Standard Deviation 0.1329
Variance 0.01767
Sum 99763.4398
Skewness -1.6253
Kurtosis 5.1881
Coefficient of Variation 0.1826
  • b is not normally distributed (p-value 1.6009744270741092e-05)
  • b has 8554 outliers
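
Several of the descriptive statistics for b are derived from others, so they can be sanity-checked by hand. The sketch below uses the standard definitions (variance as the squared standard deviation, coefficient of variation as standard deviation over mean, IQR as Q3 minus Q1). The inputs are the rounded values printed in the report, so small differences in the last digit are expected.

```python
import math

# Reported values for variable b, copied from this report (already rounded).
mean, std = 0.7281, 0.1329
q1, q3 = 0.6784, 0.8065
minimum, maximum = 0.0, 1.0

variance = std ** 2               # report shows 0.01767
cv = std / mean                   # report shows 0.1826
iqr = q3 - q1                     # report shows 0.1281
value_range = maximum - minimum   # report shows 1

# Compare with the reported figures, allowing for rounding of the inputs.
checks = {
    "variance": math.isclose(variance, 0.01767, abs_tol=5e-4),
    "coefficient of variation": math.isclose(cv, 0.1826, abs_tol=5e-4),
    "iqr": math.isclose(iqr, 0.1281, abs_tol=5e-4),
    "range": math.isclose(value_range, 1.0),
}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

The same relationships apply to the other numerical variables in the report.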

c

numerical

Approximate Distinct Count 135090
Approximate Unique (%) 98.6%
Missing 12984
Missing (%) 8.7%
Infinite 0
Infinite (%) 0.0%
Memory Size 2192256
Mean 260445.107
Minimum 0.16
Maximum 1.3879e+07
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • c is skewed right (γ1 = 6.7243)

Quantile Statistics

Minimum 0.16
5-th Percentile 550.1225
Q1 9679.915
Median 43711.655
Q3 145443.6275
95-th Percentile 1.032e+06
Maximum 1.3879e+07
Range 1.3879e+07
IQR 135763.7125

Descriptive Statistics

Mean 260445.107
Standard Deviation 846436.1416
Variance 7.1645e+11
Sum 3.5685e+10
Skewness 6.7243
Kurtosis 57.3665
Coefficient of Variation 3.25
  • c is not normally distributed (p-value 5.829319991925633e-25)
  • c has 16946 outliers
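
The report does not say how it counts outliers. One common convention is Tukey's rule, which flags values more than 1.5 times the IQR outside the quartiles. The sketch below applies that rule to the quartiles reported for c. It is an assumption for illustration, not necessarily the rule the report used, and `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles and 95th percentile for variable c, copied from this report.
q1, q3 = 9679.915, 145443.6275
p95 = 1.032e6

lower, upper = tukey_fences(q1, q3)
print(f"lower fence: {lower:.2f}")
print(f"upper fence: {upper:.2f}")

# c has no negative values, so only the upper fence can flag anything.
# The 95th percentile lies above the upper fence, so under this rule
# more than 5% of the non-missing values would count as outliers.
print(f"95th percentile above upper fence: {p95 > upper}")
```

For a strongly right-skewed variable such as c, a rule based on the IQR flags a large share of the upper tail, which is worth keeping in mind before treating those rows as errors.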

d

numerical

Approximate Distinct Count 51
Approximate Unique (%) 0.0%
Missing 365
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 2394160
Mean 21.6777
Minimum 0
Maximum 50
Zeros 786
Zeros (%) 0.5%
Negatives 0
Negatives (%) 0.0%
  • d is skewed right (γ1 = 0.4221)

Quantile Statistics

Minimum 0
5-th Percentile 1
Q1 2
Median 14
Q3 50
95-th Percentile 50
Maximum 50
Range 50
IQR 48

Descriptive Statistics

Mean 21.6777
Standard Deviation 20.0621
Variance 402.4897
Sum 3.2437e+06
Skewness 0.4221
Kurtosis -1.5242
Coefficient of Variation 0.9255
  • d is not normally distributed (p-value 3.836804398696798e-18)

e

numerical

Approximate Distinct Count 43208
Approximate Unique (%) 28.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 0.2206
Minimum 0
Maximum 833.3333
Zeros 65055
Zeros (%) 43.4%
Negatives 0
Negatives (%) 0.0%
  • e is skewed right (γ1 = 281.2206)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0.09519
Q3 0.2829
95-th Percentile 0.7112
Maximum 833.3333
Range 833.3333
IQR 0.2829

Descriptive Statistics

Mean 0.2206
Standard Deviation 2.435
Variance 5.9292
Sum 33096.1756
Skewness 281.2206
Kurtosis 92777.126
Coefficient of Variation 11.036
  • e is not normally distributed (p-value 4.226540303380305e-25)
  • e has 7580 outliers

f

numerical

Approximate Distinct Count 1338
Approximate Unique (%) 0.9%
Missing 11
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2399824
Mean 51.1694
Minimum -5
Maximum 145274
Zeros 25390
Zeros (%) 16.9%
Negatives 1277
Negatives (%) 0.9%
  • f is skewed right (γ1 = 126.1817)

Quantile Statistics

Minimum -5
5-th Percentile 0
Q1 1
Median 8
Q3 33
95-th Percentile 162
Maximum 145274
Range 145279
IQR 32

Descriptive Statistics

Mean 51.1694
Standard Deviation 709.4729
Variance 503351.8019
Sum 7.6748e+06
Skewness 126.1817
Kurtosis 22063.8312
Coefficient of Variation 13.8652
  • f is not normally distributed (p-value 4.226583696676785e-25)
  • f has 17116 outliers

g

categorical

Approximate Distinct Count 51
Approximate Unique (%) 0.0%
Missing 194
Missing (%) 0.1%
Memory Size 10037002
  • The largest value (BR) is over 3.49 times larger than the second largest value (AR)

Length

Mean 2
Standard Deviation 0
Median 2
Minimum 2
Maximum 2

Sample

1st row AR
2nd row AR
3rd row BR
4th row BR
5th row BR

Letter

Count 299612
Lowercase Letter 0
Space Separator 0
Uppercase Letter 299612
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (BR, AR) take over 50.0%
  • g has words of constant length

h

numerical

Approximate Distinct Count 59
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 14.1935
Minimum 0
Maximum 58
Zeros 12840
Zeros (%) 8.6%
Negatives 0
Negatives (%) 0.0%
  • h is skewed right (γ1 = 1.196)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 3
Median 9
Q3 21
95-th Percentile 46
Maximum 58
Range 58
IQR 18

Descriptive Statistics

Mean 14.1935
Standard Deviation 14.1612
Variance 200.54
Sum 2.129e+06
Skewness 1.196
Kurtosis 0.6032
Coefficient of Variation 0.9977
  • h is not normally distributed (p-value 1.7454508700446156e-16)
  • h has 5045 outliers

j

categorical

Approximate Distinct Count 8324
Approximate Unique (%) 5.5%
Missing 0
Missing (%) 0.0%
Memory Size 11400000

Length

Mean 11
Standard Deviation 0
Median 11
Minimum 11
Maximum 11

Sample

1st row cat_d26ab52
2nd row cat_ea962fb
3rd row cat_4c2544e
4th row cat_1b59ee3
5th row cat_9bacaa5

Letter

Count 856377
Lowercase Letter 856377
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 643623
  • j contains many words: 8324 words
  • j has words of constant length

k

numerical

Approximate Distinct Count 150000
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 0.4975
Minimum 4.18e-06
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • k is only marginally skewed right (γ1 = 0.0104)

Quantile Statistics

Minimum 4.18e-06
5-th Percentile 0.04933
Q1 0.2468
Median 0.496
Q3 0.7465
95-th Percentile 0.9493
Maximum 1
Range 1
IQR 0.4997

Descriptive Statistics

Mean 0.4975
Standard Deviation 0.2883
Variance 0.08314
Sum 74629.8742
Skewness 0.01044
Kurtosis -1.1974
Coefficient of Variation 0.5796

l

numerical

Approximate Distinct Count 7297
Approximate Unique (%) 4.9%
Missing 11
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2399824
Mean 2305.4094
Minimum 0
Maximum 7544
Zeros 2030
Zeros (%) 1.4%
Negatives 0
Negatives (%) 0.0%
  • l is skewed right (γ1 = 0.6812)

Quantile Statistics

Minimum 0
5-th Percentile 111
Q1 910
Median 1937
Q3 3445
95-th Percentile 5552
Maximum 7544
Range 7544
IQR 2535

Descriptive Statistics

Mean 2305.4094
Standard Deviation 1712.3796
Variance 2.9322e+06
Sum 3.4579e+08
Skewness 0.6812
Kurtosis -0.3719
Coefficient of Variation 0.7428
  • l has 384 outliers

m

numerical

Approximate Distinct Count 1793
Approximate Unique (%) 1.2%
Missing 365
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 2394160
Mean 299.9696
Minimum 0
Maximum 2225
Zeros 15898
Zeros (%) 10.6%
Negatives 0
Negatives (%) 0.0%
  • m is skewed right (γ1 = 1.377)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 42
Median 193
Q3 459
95-th Percentile 967
Maximum 2225
Range 2225
IQR 417

Descriptive Statistics

Mean 299.9696
Standard Deviation 321.0758
Variance 103089.6733
Sum 4.4886e+07
Skewness 1.377
Kurtosis 1.7499
Coefficient of Variation 1.0704
  • m is not normally distributed (p-value 2.3730136065469373e-18)
  • m has 4499 outliers

n

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9900000
  • The largest value (1) is over 9.24 times larger than the second largest value (0)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 150000
  • The top 2 categories (1, 0) take over 50.0%
  • n has words of constant length

o

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 108857
Missing (%) 72.6%
Memory Size 2715438

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row Y
2nd row Y
3rd row N
4th row Y
5th row N

Letter

Count 41143
Lowercase Letter 0
Space Separator 0
Uppercase Letter 41143
Dash Punctuation 0
Decimal Number 0
  • o has words of constant length

p

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9900000

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row Y
2nd row Y
3rd row Y
4th row Y
5th row N

Letter

Count 150000
Lowercase Letter 0
Space Separator 0
Uppercase Letter 150000
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Y, N) take over 50.0%
  • p has words of constant length

fecha

categorical

Approximate Distinct Count 145813
Approximate Unique (%) 97.2%
Missing 0
Missing (%) 0.0%
Memory Size 12600000

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2020-03-20 09:28:1...
2nd row 2020-03-09 13:58:2...
3rd row 2020-04-08 12:25:5...
4th row 2020-03-14 11:46:1...
5th row 2020-03-23 14:17:1...

Letter

Count 0
Lowercase Letter 0
Space Separator 150000
Uppercase Letter 0
Dash Punctuation 300000
Decimal Number 2100000
  • fecha contains many words: 60590 words
  • fecha has words of constant length
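
fecha is typed as categorical, but its constant length of 19 and its character counts (14 digits, 2 dashes and 1 space per value) are consistent with a `YYYY-MM-DD HH:MM:SS` timestamp. The sketch below parses such a string with Python's standard library. The format string is an assumption inferred from those counts, and the example value is made up, because the sample rows in the report are truncated.

```python
from datetime import datetime

# Assumed layout, inferred from the constant length of 19 characters.
FECHA_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_fecha(value: str) -> datetime:
    """Parse one fecha string, rejecting values that are not 19 characters."""
    if len(value) != 19:
        raise ValueError(f"expected 19 characters, got {len(value)}")
    return datetime.strptime(value, FECHA_FORMAT)

# Hypothetical example value, not taken from the dataset.
example = "2020-01-01 00:00:00"
parsed = parse_fecha(example)
print(parsed.isoformat())
```

Once parsed, the column can be treated as a datetime rather than a high-cardinality categorical, which makes the High Cardinality insight for fecha less of a concern.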

monto

numerical

Approximate Distinct Count 17831
Approximate Unique (%) 11.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 43.5231
Minimum 0.02
Maximum 3696.35
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • monto is skewed right (γ1 = 9.7338)

Quantile Statistics

Minimum 0.02
5-th Percentile 3.89
Q1 9.38
Median 20.61
Q3 40.6925
95-th Percentile 158.31
Maximum 3696.35
Range 3696.33
IQR 31.3125

Descriptive Statistics

Mean 43.5231
Standard Deviation 91.5579
Variance 8382.8468
Sum 6.5285e+06
Skewness 9.7338
Kurtosis 180.5426
Coefficient of Variation 2.1037
  • monto is not normally distributed (p-value 5.969587820612381e-25)
  • monto has 14823 outliers

score

numerical

Approximate Distinct Count 101
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 2400000
Mean 48.0662
Minimum 0
Maximum 100
Zeros 4969
Zeros (%) 3.3%
Negatives 0
Negatives (%) 0.0%
  • score is only marginally skewed right (γ1 = 0.016)

Quantile Statistics

Minimum 0
5-th Percentile 3
Q1 23
Median 48
Q3 73
95-th Percentile 93
Maximum 100
Range 100
IQR 50

Descriptive Statistics

Mean 48.0662
Standard Deviation 28.9951
Variance 840.7171
Sum 7.2099e+06
Skewness 0.01597
Kurtosis -1.1831
Coefficient of Variation 0.6032
  • score is not normally distributed (p-value 1.6819130386489164e-18)

fraude

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9900000
  • The largest value (0) is over 19.0 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 150000
  • The top 2 categories (0, 1) take over 50.0%
  • fraude has words of constant length
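
The ratio reported for fraude can be turned into an approximate class share. With two classes and no missing values, a majority-to-minority ratio r gives a minority share of 1 / (1 + r). The sketch below uses r = 19.0 as printed; the report says "over 19.0", so the result is a rough estimate rather than an exact count.

```python
# Values copied from this report.
n_rows = 150_000
ratio = 19.0  # count of class 0 divided by count of class 1, as printed

# With two classes and no missing values: share of class 1 = 1 / (1 + ratio).
minority_share = 1 / (1 + ratio)
minority_count_estimate = round(n_rows * minority_share)

print(f"approximate share of fraude = 1: {100 * minority_share:.1f}%")
print(f"approximate rows with fraude = 1: {minority_count_estimate}")
```

A minority class of roughly this size means accuracy alone is a weak yardstick for any model of fraude; always predicting 0 would already be right about 19 times out of 20.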

Interactions

Correlations

Missing Values